Skip to content

ci(bench): add per-PR perf canary for extractor/graph/native changes#1488

Merged
carlos-alm merged 11 commits into
mainfrom
fix/bench-pr-canary-1433
Jun 13, 2026
Merged

ci(bench): add per-PR perf canary for extractor/graph/native changes#1488
carlos-alm merged 11 commits into
mainfrom
fix/bench-pr-canary-1433

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

Summary

  • Adds .github/workflows/perf-canary.yml — a path-filtered workflow that fires only on PRs touching src/extractors/**, src/domain/graph/**, crates/**, or the benchmark scripts themselves
  • Runs the incremental-benchmark suite only (full build + no-op + 1-file rebuild, both engines) — completes in ~5–10 minutes vs 20–60 for the full suite
  • Gates on 50% regression threshold via new BENCH_CANARY=1 mode in the regression guard

Prevents the class of regressions that accumulated invisibly across the Phase 8.x PRs and were only caught at v3.12.0 publish time (+98% native full build, +1827% native 1-file rebuild). The canary would have caught both within 10 minutes of the first regressing PR landing.

Changes:

tests/benchmarks/regression-guard.test.ts — adds BENCH_CANARY mode:

  • Raises thresholds: 50% standard, 100% noisy-metrics, 150% WASM timing
  • Skips build/query/resolution describe blocks (only incremental runs)
  • Skips the stale-KNOWN_REGRESSIONS check (irrelevant for a build gate)

.github/workflows/perf-canary.yml — new workflow:

  • Triggers on pull_request with path filters for extractors/graph/crates
  • Builds the native addon from source (same as ci.yml)
  • Runs incremental-benchmark.ts --dist, updates the incremental report, then invokes the regression guard with BENCH_CANARY=1

Closes #1433

… diff

Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh
(PostToolUse Bash): the pre-hook captures git status --porcelain to a
per-worktree temp file before each Bash call; the post-hook diffs the
before/after state and appends newly modified or created files to
.claude/session-edits.log.

This closes the gap where files written by sed -i, printf redirects,
tee, heredocs, or build tools (Cargo.lock, lockfiles) were never
recorded, causing guard-git.sh to emit false-positive BLOCKED errors.

Closes #1457
- clojure.rs: annotate lifetime-anchor assignment to silence false-positive
- cfg.rs: remove never-called start_line_of method
- complexity.rs: remove never-constructed NotHandled variant; convert
  irrefutable if-let patterns to plain let destructures
- dataflow.rs: remove never-read callee fields from CallReturn/Destructured
- incremental.rs: remove never-read lang field from CacheEntry

cargo check and cargo clippy both clean after these changes.
Adds .github/workflows/perf-canary.yml — a path-filtered workflow that
fires on PRs touching src/extractors/, src/domain/graph/, or crates/**
and runs only the incremental-benchmark suite (full build + no-op +
1-file rebuild, both engines). Catches the class of regressions that
accumulated invisibly across the Phase 8.x PRs and were only detected
at v3.12.0 publish time.

The regression guard gains BENCH_CANARY=1 mode: raises thresholds to
50%/100%/150% (standard/noisy/WASM) and skips the build, query, and
resolution suites — only incremental checks run. This absorbs shared-
runner timing variance while still blocking catastrophic regressions
(+98% full build, +1827% 1-file rebuild from v3.12.0).

Closes #1433
@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

Claude finished @carlos-alm's task —— View job


PR Review: ci(bench): add per-PR perf canary for extractor/graph/native changes

  • Gather context and read changed files
  • Phase 1: Root Cause Analysis
  • Phase 2: Configuration Change Detection
  • Phase 3: Test Integrity
  • Phase 4: Code Quality
  • Phase 5: Backlog Compliance
  • Final Assessment

Root Cause Analysis

Problem Being Solved: Performance regressions in the extractor/native layers accumulated undetected across the Phase 8.x PR series and were only caught at v3.12.0 publish time (+98% native full build, +1827% native 1-file rebuild). The full pre-publish benchmark suite (20–60 min) is too slow to gate individual PRs. This PR adds a lightweight, path-filtered canary that runs only the incremental benchmark tier (5–10 min) on PRs that touch the relevant layers.

Why This Approach: Targeted path filters + reduced benchmark scope is the correct design — run expensive checks only where regressions are most likely, keep the signal tight. The BENCH_CANARY mode in the regression guard is a clean toggle that reuses the existing guard infrastructure rather than duplicating logic.

Risk Assessment: Low. Thresholds are raised, not lowered — this is explicitly a coarser gate that accepts more CI-noise in exchange for fast feedback, not a mechanism to hide real regressions. The safety argument (native engine catches real regressions; WASM only needs gross-catastrophe detection) is correctly stated in the comments.


Issues Found

1. has resolve data to compare assertion runs in canary mode — will fail on a fresh runner (Medium)

tests/benchmarks/regression-guard.test.ts:827-832

test('has resolve data to compare', () => {
  expect(
    resolvePair != null,
    'No import-resolution benchmark data with ≥2 comparable entries',
  ).toBe(true);
});

This test is inside describe('incremental benchmarks') which is not skipped in canary mode. But resolvePair is derived from incrementalHistory with a .filter(e => e.resolve != null) — and the canary workflow writes a fresh incremental-canary-result.json that contains exactly one dev entry. If there is no prior dev-or-release entry with resolve data in the committed history file, resolvePair will be null and this assertion will fail in canary mode.

Similarly, test('has at least one engine to compare') at line 818 makes the same assumption that there are ≥2 incremental entries, which is guaranteed in committed history but only if the dev entry isn't the first. Worth auditing whether the canary's updated report always has a second comparable entry before this gate runs.

2. Missing node-version pin — node: "22" versus node: 22 inconsistency (Minor)

perf-canary.yml:46

node-version: "22"

ci.yml uses node-version: 22 (integer, no quotes). This is functionally equivalent with actions/setup-node, but inconsistent with the rest of the repo. Minor, but worth aligning for readability.

3. actions/checkout@v6 with fetch-depth: 0 — not needed for this workflow (Minor)

perf-canary.yml:40-42

- uses: actions/checkout@v6
  with:
    fetch-depth: 0

fetch-depth: 0 fetches the full history. The canary doesn't need git history for comparisons — it compares against committed benchmark data in the repo, not against git refs. ci.yml and benchmark.yml use the default shallow clone. This wastes time on large repos and is unnecessary. The fetch-depth: 0 was likely copied from a workflow that does git log or branch comparisons; it should be removed here.

4. update-incremental-report step runs before regression guard — creates a misleading artifact on failure (Minor)

perf-canary.yml:94-103

- name: Update incremental report
  run: |
    ...
    node $STRIP_FLAG scripts/update-incremental-report.ts incremental-canary-result.json

- name: Regression guard (50% threshold)
  env:
    RUN_REGRESSION_GUARD: "1"
    BENCH_CANARY: "1"
  run: npm run test:regression-guard

The report is updated (and presumably written to disk) before the regression guard runs. If the canary guard fails, the updated report file with the regressed dev entry is left behind. Since the artifact is always uploaded (if: always()), this is fine for debugging. But if a subsequent step (or a concurrent job) picked up that file, it could persist bad data. For a pure CI artifact this is fine — just worth noting it's not committed, so there's no persistence risk.

5. No permissions block — relies on workflow-level defaults (Informational)

perf-canary.yml — entire file

The workflow has no explicit permissions stanza. GitHub's default for pull_request events from forks is read-only for most tokens, so this likely works fine. However, ci.yml also omits explicit permissions, so this is consistent with the repo convention. Just noting it for awareness.


Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies. Uses existing Rust/napi-rs/vitest infrastructure.
  • Foundation-aligned: ✓ — Pure CI infrastructure. No cloud API calls, no new runtime toolchain. Doesn't modify the programmatic API.
  • Problem-fit: 3/5 — Correct and necessary CI infrastructure, but doesn't directly address the "lost AI" navigation problem. It's maintenance/quality work. Per the backlog framing: "primarily optimizes for CI/CD" is deprioritized by P8, but preventing a +1827% regression from shipping is a real reliability concern. Call it 3 on the 1–5 scale — useful, not an agent-accuracy feature.
  • Breaking: No — purely additive CI workflow + test flag.
  • Tier: 1 (zero-dep ✓, foundation-aligned ✓)

Critical Concerns

The has resolve data to compare test (issue #1 above) is the only one that could cause a spurious canary failure. The fix is straightforward: add test.skipIf(BENCH_CANARY)(...) around that assertion (since the canary's incremental run may not produce resolve data), matching the pattern used for has at least one engine to compare... actually, wait — that test is also not skipped. Both sentinel tests inside describe('incremental benchmarks') assume committed history has ≥2 comparable entries. If the canary writes the first dev entry to a fresh report file (not to the committed one), the comparison data exists in committed history and both tests pass. Verify the update step appends to the committed INCREMENTAL-BENCHMARKS.md, not to incremental-canary-result.json. If it does, the concern is mitigated.


Final Recommendation

  • Rating: ⭐⭐⭐⭐ (4/5)
  • Action: APPROVE WITH CONDITIONS
  • Reasoning: The PR solves a real, documented problem with a well-scoped design. The BENCH_CANARY flag is clean, the threshold adjustments are thoroughly documented with empirical data (CI run numbers, percentage swings, root-cause attribution), and the workflow structure follows established repo conventions. The one actionable issue is the fetch-depth: 0 (remove it — it's wasteful). The has resolve data to compare concern should be confirmed to be safe before merge (i.e., verify the update script appends to committed history, not just the canary JSON). Everything else is minor polish.

Suggested fix before merging:

  1. Remove fetch-depth: 0 from the checkout step in perf-canary.yml — not needed.
  2. Confirm scripts/update-incremental-report.ts appends the dev entry to generated/benchmarks/INCREMENTAL-BENCHMARKS.md (committed file), so findLatestPair has ≥2 entries when the regression guard runs. If it doesn't, the has resolve data to compare and has at least one engine to compare tests will fail on a first-run canary.

@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Adds a lightweight per-PR performance canary workflow (.github/workflows/perf-canary.yml) that fires only on changes to the extractor, graph, and native crate layers, and extends regression-guard.test.ts with a BENCH_CANARY=1 mode that runs only incremental benchmarks at relaxed thresholds (50% standard, 100% noisy, 150% WASM).

  • New workflow builds the native addon from source, runs incremental-benchmark.ts --dist, updates the incremental report, and invokes the regression guard with BENCH_CANARY=1; path filters (including regression-guard.test.ts and update-incremental-report.ts) ensure the canary itself triggers the gate when threshold logic changes.
  • BENCH_CANARY mode in the test file uses describe.skipIf / test.skipIf to skip build, query, resolution, and stale-exemption checks, while keeping all incremental-benchmark comparisons (plus the 'has at least one engine to compare' and 'has resolve data to compare' guard tests) active so the canary cannot silently pass with no data.

Confidence Score: 5/5

Safe to merge — adds a new workflow and expands a test file with no changes to production code paths.

Both changed files are additive: a new workflow that only runs benchmarks and a test file extension that gates on an environment variable. The threshold values and skip logic are well-documented, internally consistent, and validated by guard tests that prevent silent no-op passes. Action versions and workflow conventions are consistent with the rest of the repo. Previous review feedback has been addressed.

No files require special attention.

Important Files Changed

Filename Overview
.github/workflows/perf-canary.yml New workflow: path-filtered PR gate that builds native addon, runs incremental benchmark, and invokes regression guard in BENCH_CANARY mode; consistent with repo conventions (action versions, concurrency, permissions: {}, retry loop).
tests/benchmarks/regression-guard.test.ts Adds BENCH_CANARY constant and threshold overrides (0.5/1.0/1.5), skips non-incremental describe blocks and maintenance checks in canary mode; incremental benchmarks block and its guard tests remain unskipped as intended.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    PR[Pull Request push] --> PF{Path filter matches?\nsrc/extractors/**\nsrc/domain/graph/**\ncrates/**\nscripts/benchmark*.ts\ntests/benchmarks/regression-guard.test.ts}
    PF -->|No| SKIP[Workflow skipped]
    PF -->|Yes| CHECKOUT[checkout + setup Node 22 + Rust]
    CHECKOUT --> BUILD_NATIVE[napi build --release\ncrates/codegraph-core]
    BUILD_NATIVE --> NPM[npm install with retry loop]
    NPM --> CI_NATIVE[ci-install-native.mjs\noverlay local addon over published binary]
    CI_NATIVE --> TS_BUILD[npm run build\nbuild TypeScript dist/]
    TS_BUILD --> BENCH[incremental-benchmark.ts --version dev --dist\n→ incremental-canary-result.json]
    BENCH --> UPDATE[update-incremental-report.ts\nmutates INCREMENTAL-BENCHMARKS.md locally]
    UPDATE --> GUARD{npm run test:regression-guard\nRUN_REGRESSION_GUARD=1\nBENCH_CANARY=1}
    GUARD -->|native Full build >50%\nor 1-file rebuild >100%| FAIL[❌ Job fails — regression detected]
    GUARD -->|WASM timing >150%| FAIL
    GUARD -->|All within thresholds| PASS[✅ Job passes]
    PASS --> UPLOAD[Upload incremental-canary-result.json artifact]
    FAIL --> UPLOAD
Loading

Reviews (4): Last reviewed commit: "fix: resolve merge conflicts with main" | Re-trigger Greptile

Comment on lines +28 to +30
concurrency:
group: perf-canary-${{ github.ref }}
cancel-in-progress: true

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The workflow is missing an explicit permissions: block. Other workflows in this repo (e.g. benchmark.yml) use permissions: {} to lock down the token scope. Without an explicit declaration the job inherits the repo default, which may be broader than needed for a read-only benchmark run that only writes an artifact.

Suggested change
concurrency:
group: perf-canary-${{ github.ref }}
cancel-in-progress: true
permissions: {}
concurrency:
group: perf-canary-${{ github.ref }}
cancel-in-progress: true

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added permissions: {} at workflow level (above concurrency:) to lock down the token scope, consistent with benchmark.yml and publish.yml.

Comment on lines +19 to +26
paths:
- "src/extractors/**"
- "src/domain/graph/**"
- "crates/**"
- "scripts/benchmark.ts"
- "scripts/incremental-benchmark.ts"
- "scripts/lib/bench-config.ts"
- "scripts/lib/fork-engine.ts"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The path filter doesn't include tests/benchmarks/regression-guard.test.ts or scripts/update-incremental-report.ts. This very PR modifies the former to introduce BENCH_CANARY mode, but the canary won't fire for that change because no src/extractors/** / crates/** paths were touched. A future PR that adjusts the canary thresholds or the report-update logic will silently bypass this gate: the full pre-publish-benchmark runs but without BENCH_CANARY=1, so the BENCH_CANARY code path is never exercised in CI.

Suggested change
paths:
- "src/extractors/**"
- "src/domain/graph/**"
- "crates/**"
- "scripts/benchmark.ts"
- "scripts/incremental-benchmark.ts"
- "scripts/lib/bench-config.ts"
- "scripts/lib/fork-engine.ts"
paths:
- "src/extractors/**"
- "src/domain/graph/**"
- "crates/**"
- "scripts/benchmark.ts"
- "scripts/incremental-benchmark.ts"
- "scripts/lib/bench-config.ts"
- "scripts/lib/fork-engine.ts"
- "scripts/update-incremental-report.ts"
- "tests/benchmarks/regression-guard.test.ts"

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added both scripts/update-incremental-report.ts and tests/benchmarks/regression-guard.test.ts to the path filter. PRs that only touch the canary infrastructure will now correctly trigger the canary run.

Comment on lines +99 to +103
- name: Regression guard (50% threshold)
env:
RUN_REGRESSION_GUARD: "1"
BENCH_CANARY: "1"
run: npm run test:regression-guard

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Canary must be a required status check to act as a gate

The workflow is a separate file and is not included in ci.yml's ci-pipeline aggregator (needs: [lint, native-host-build, test, ...]). For it to actually block merges on regressing PRs, perf-canary / Perf canary (incremental tiers) needs to be added as a required status check in branch protection rules. Without that step the canary is advisory only — a failing run won't prevent the PR from being merged.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — making the canary a required status check in branch protection rules is a repo settings change, not a code change. Filed as a follow-up action for the repo admin to add perf-canary / Perf canary (incremental tiers) to required status checks after this PR merges.

… excess fetch-depth

- Add `permissions: {}` to lock down token scope for the read-only canary
- Add `scripts/update-incremental-report.ts` and
  `tests/benchmarks/regression-guard.test.ts` to path filter so PRs that
  modify the canary machinery itself also trigger the canary
- Remove `fetch-depth: 0` (full history not needed; canary compares against
  committed benchmark data, not git refs)
- Align `node-version: 22` with the integer format used in ci.yml
@carlos-alm

Copy link
Copy Markdown
Contributor Author

Addressed all review feedback:

  • permissions block (Greptile P2): added permissions: {} at workflow level
  • path filter gaps (Greptile P2): added scripts/update-incremental-report.ts and tests/benchmarks/regression-guard.test.ts so canary triggers on changes to its own machinery
  • fetch-depth: 0 (Claude review): removed — not needed since canary compares against committed benchmark data, not git history
  • node-version: "22" (Claude review): aligned to integer 22 matching ci.yml convention
  • required status check (Greptile P2): acknowledged as a repo settings action after merge; not a code change

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

Base automatically changed from fix/benchmark-warmup-1440 to main June 13, 2026 09:14
@carlos-alm carlos-alm merged commit 65bd28a into main Jun 13, 2026
23 checks passed
@carlos-alm carlos-alm deleted the fix/bench-pr-canary-1433 branch June 13, 2026 21:05
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 13, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Benchmarks only run at publish — perf regressions accumulate invisibly between releases

1 participant